{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型融合-集成学习\n",
    "- [Voting投票器](#Voting投票器)\n",
    "- [Bagging](#Bagging)\n",
    "- [RandomForest随机森林](#RandomForest随机森林)\n",
    "- [AdaBoost](#AdaBoost)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 载入数据\n",
    "\n",
    "> 以印第安人糖尿病数据集为例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>preg</th>\n",
       "      <th>plas</th>\n",
       "      <th>pres</th>\n",
       "      <th>skin</th>\n",
       "      <th>test</th>\n",
       "      <th>mass</th>\n",
       "      <th>pedi</th>\n",
       "      <th>age</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>148</td>\n",
       "      <td>72</td>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>33.6</td>\n",
       "      <td>0.627</td>\n",
       "      <td>50</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>85</td>\n",
       "      <td>66</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>26.6</td>\n",
       "      <td>0.351</td>\n",
       "      <td>31</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>183</td>\n",
       "      <td>64</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>23.3</td>\n",
       "      <td>0.672</td>\n",
       "      <td>32</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>89</td>\n",
       "      <td>66</td>\n",
       "      <td>23</td>\n",
       "      <td>94</td>\n",
       "      <td>28.1</td>\n",
       "      <td>0.167</td>\n",
       "      <td>21</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>137</td>\n",
       "      <td>40</td>\n",
       "      <td>35</td>\n",
       "      <td>168</td>\n",
       "      <td>43.1</td>\n",
       "      <td>2.288</td>\n",
       "      <td>33</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   preg  plas  pres  skin  test  mass   pedi  age  class\n",
       "0     6   148    72    35     0  33.6  0.627   50      1\n",
       "1     1    85    66    29     0  26.6  0.351   31      0\n",
       "2     8   183    64     0     0  23.3  0.672   32      1\n",
       "3     1    89    66    23    94  28.1  0.167   21      0\n",
       "4     0   137    40    35   168  43.1  2.288   33      1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "items = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']\n",
    "raw_data = pd.read_csv('pima-indians-diabetes.data.csv', names=items)\n",
    "raw_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 0], dtype=int64)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看 class列的数据类别\n",
    "raw_data['class'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Voting投票器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 引入模型选择器\n",
    "from sklearn import model_selection\n",
    "# 引入不同的分类器\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.svm import SVC\n",
    "# 引入 投票器 分类\n",
    "from sklearn.ensemble import VotingClassifier\n",
    "# 消除 warning\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.构建K折交叉验证数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KFold(n_splits=5, random_state=2021, shuffle=False)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kfold = model_selection.KFold(n_splits=5, random_state=2021)\n",
    "kfold"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.构建投票器的子模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimators = []\n",
    "model_01 = LogisticRegression()\n",
    "model_02 = DecisionTreeClassifier()\n",
    "model_03 = SVC()\n",
    "\n",
    "estimators.append(('LR', model_01))\n",
    "estimators.append(('DC', model_02))\n",
    "estimators.append(('SVM', model_03))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.构建Voting投票器融合"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VotingClassifier(estimators=[('LR', LogisticRegression()),\n",
       "                             ('DC', DecisionTreeClassifier()), ('SVM', SVC())])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vote_ens = VotingClassifier(estimators=estimators)\n",
    "vote_ens"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.交叉验证计算模型得分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K折交叉验证得分=> [0.76623377 0.70779221 0.79220779 0.83660131 0.75816993]\n",
      "voting投票器得分=> 0.7722010016127664\n"
     ]
    }
   ],
   "source": [
    "# 特征值和标签值\n",
    "array = raw_data.values\n",
    "feature = array[:, 0:8]\n",
    "target = array[:, 8]\n",
    "res = model_selection.cross_val_score(vote_ens, feature, target, cv=kfold)\n",
    "print(\"K折交叉验证得分=>\", res)\n",
    "print(\"voting投票器得分=>\", res.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bagging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import BaggingClassifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.创建基学习器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DecisionTreeClassifier()"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt = DecisionTreeClassifier()\n",
    "dt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.构建Bagging融合器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=50,\n",
       "                  random_state=2021)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bagging_ens = BaggingClassifier(base_estimator=dt, n_estimators=50, random_state=2021)\n",
    "bagging_ens"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.交叉验证计算模型得分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K折交叉验证得分=> [0.75974026 0.71428571 0.80519481 0.83006536 0.75816993]\n",
      "Bagging融合模型得分=> 0.7734912146676852\n"
     ]
    }
   ],
   "source": [
    "bag_res = model_selection.cross_val_score(bagging_ens, feature, target, cv=kfold)\n",
    "print(\"K折交叉验证得分=>\", bag_res)\n",
    "print(\"Bagging融合模型得分=>\", bag_res.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RandomForest随机森林\n",
    "> 基于 Bagging 思想的集成学习模型，基学习器是CART回归分类树"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.构建随机森林模型\n",
    "- 设置回归分类树的数量为50；\n",
    "- 设置每棵树的最大特征数为5；"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RandomForestClassifier(max_features=6, n_estimators=50)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rfc = RandomForestClassifier(n_estimators=50, max_features=6)\n",
    "rfc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.交叉验证计算模型得分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K折交叉验证得分=> [0.75324675 0.70779221 0.76623377 0.82352941 0.73856209]\n",
      "Bagging融合模型得分=> 0.7578728461081402\n"
     ]
    }
   ],
   "source": [
    "rfc_res = model_selection.cross_val_score(rfc, feature, target, cv=kfold)\n",
    "print(\"K折交叉验证得分=>\", rfc_res)\n",
    "print(\"Bagging融合模型得分=>\", rfc_res.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### AdaBoost\n",
    "> 基于 boosting 思想的集成学习模型，基学习器默认是深度为1的DecisionTreeClassifier，即回归分类树"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import AdaBoostClassifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.构建AdaBoost模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AdaBoostClassifier(random_state=2021)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adaboost_ens = AdaBoostClassifier(n_estimators=50, random_state=2021)\n",
    "adaboost_ens"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.交叉验证计算模型得分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K折交叉验证得分=> [0.74025974 0.66883117 0.76623377 0.77777778 0.77124183]\n",
      "Bagging融合模型得分=> 0.7448688566335625\n"
     ]
    }
   ],
   "source": [
    "adaboost_res = model_selection.cross_val_score(adaboost_ens, feature, target, cv=kfold)\n",
    "print(\"K折交叉验证得分=>\", adaboost_res)\n",
    "print(\"Bagging融合模型得分=>\", adaboost_res.mean())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
